Classification algorithm based on undersampling and cost-sensitiveness for unbalanced data
WANG Junhong, YAN Jiarong
Journal of Computer Applications    2021, 41 (1): 48-52.   DOI: 10.11772/j.issn.1001-9081.2020060878
To address the low prediction accuracy of traditional classifiers on the minority class of unbalanced datasets, an unbalanced data classification algorithm based on undersampling and cost-sensitiveness, called USCBoost (UnderSamples and Cost-sensitive Boosting), was proposed. Firstly, in each iteration of the AdaBoost (Adaptive Boosting) algorithm, before the base classifier was trained, the majority class samples were sorted in descending order of weight, and as many majority class samples as there were minority class samples were selected according to their weights. The weights of the selected majority class samples were then normalized, and these samples were combined with the minority class samples into a temporary training set used to train the base classifier. Secondly, in the weight update stage, a higher misclassification cost was assigned to the minority class, so that the weights of minority class samples increased faster and the weights of majority class samples increased more slowly. On ten UCI datasets, USCBoost was compared with AdaBoost, AdaCost (Cost-sensitive AdaBoosting), and RUSBoost (Random Under-Sampling Boosting). Experimental results show that USCBoost achieves the highest F1-measure on six of the datasets and the highest G-mean on nine of them, indicating that the proposed algorithm has better classification performance on unbalanced data.
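The two steps described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract gives no formulas, so the function names, the choice of keeping the highest-weight majority samples, the normalization of the kept majority weights among themselves, the exponential update rule, and the cost values `minority_cost` and `majority_cost` are all assumptions made for this example.

```python
import math


def undersample_majority(weights, labels, minority_label=1):
    """Build a balanced temporary training set (hypothetical helper).

    Returns the chosen sample indices and their weights.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Sort majority samples from largest to smallest weight.
    majority.sort(key=lambda i: weights[i], reverse=True)

    # Assumption: keep the top-weighted majority samples, as many as
    # there are minority samples.
    kept = majority[:len(minority)]

    # Assumption: "normalized" is read as rescaling the kept majority
    # weights so that they sum to 1 among themselves.
    kept_total = sum(weights[i] for i in kept)
    kept_weights = [weights[i] / kept_total for i in kept]

    indices = kept + minority
    temp_weights = kept_weights + [weights[i] for i in minority]
    return indices, temp_weights


def update_weights(weights, labels, predictions, alpha,
                   minority_cost=2.0, majority_cost=1.0, minority_label=1):
    """Cost-sensitive weight update (assumed form, not from the paper).

    Misclassified samples are scaled by exp(alpha * cost), where the cost
    is larger for the minority class; correctly classified samples are
    scaled by exp(-alpha). The result is renormalized to sum to 1.
    """
    updated = []
    for w, y, p in zip(weights, labels, predictions):
        cost = minority_cost if y == minority_label else majority_cost
        if p != y:
            updated.append(w * math.exp(alpha * cost))
        else:
            updated.append(w * math.exp(-alpha))
    total = sum(updated)
    return [w / total for w in updated]


# Example: three majority samples (label 0) and two minority samples (label 1).
indices, temp_weights = undersample_majority(
    [0.1, 0.3, 0.2, 0.25, 0.15], [0, 0, 0, 1, 1])

# Example: one misclassified sample per class, alpha chosen arbitrarily.
new_weights = update_weights(
    [0.25, 0.25, 0.25, 0.25], [0, 0, 1, 1], [1, 0, 0, 1], alpha=0.5)
```

With any positive `alpha`, a misclassified minority sample gains weight faster than a misclassified majority sample in this sketch, which mirrors the behavior the abstract describes for the weight update stage.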